Sub-lexical Modelling Using a Finite State Transducer Framework
Abstract
The finite state transducer (FST) approach [1] has recently been widely used as an effective and flexible framework for speech systems. In this framework, a speech recognizer is represented as the composition of a series of FSTs that combine knowledge sources across sub-lexical and higher-level linguistic layers. In this paper, we use the FST framework to explore several sub-lexical modelling approaches, and propose a hybrid model that combines an ANGIE [2] morpho-phonemic model with a lexicon-based phoneme network model. These sub-lexical models are converted to FST representations and can be conveniently composed to build the recognizer. Our preliminary perplexity experiments show that the proposed hybrid model imposes strong constraints on in-vocabulary words while also providing detailed sub-lexical syllabification and morphological analysis of out-of-vocabulary (OOV) words. It therefore has the potential to offer good performance and to handle the OOV problem in speech recognition better.
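To make the idea of building a recognizer by composing FSTs concrete, here is a minimal Python sketch. It is not the paper's implementation: the `FST` class, the `compose` and `best_path` helpers, and the toy phone and phoneme labels are all hypothetical, and the sketch assumes epsilon-free transducers with additive (tropical-style) weights where lower is better. A small phone-to-phoneme transducer is composed with a lexicon-style phoneme network that accepts only in-vocabulary phoneme sequences.

```python
from collections import defaultdict


class FST:
    """Toy epsilon-free weighted transducer (weights add; lower is better)."""

    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = set(finals)
        # state -> list of (in_label, out_label, weight, next_state)
        self.arcs = defaultdict(list)
        for src, ilab, olab, weight, dst in arcs:
            self.arcs[src].append((ilab, olab, weight, dst))


def compose(a, b):
    """Compose a with b: pair up states and match a's outputs to b's inputs."""
    start = (a.start, b.start)
    arcs, finals = [], set()
    stack, seen = [start], {start}
    while stack:
        state = stack.pop()
        qa, qb = state
        if qa in a.finals and qb in b.finals:
            finals.add(state)
        for ia, oa, wa, na in a.arcs.get(qa, []):
            for ib, ob, wb, nb in b.arcs.get(qb, []):
                if oa == ib:  # labels must agree on the shared tape
                    nxt = (na, nb)
                    arcs.append((state, ia, ob, wa + wb, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return FST(start, finals, arcs)


def best_path(fst, inputs):
    """Return the cheapest (weight, outputs) for inputs, or None if rejected."""
    frontier = {fst.start: (0.0, [])}
    for sym in inputs:
        nxt = {}
        for q, (w, out) in frontier.items():
            for ilab, olab, aw, dst in fst.arcs.get(q, []):
                if ilab == sym:
                    cand = (w + aw, out + [olab])
                    if dst not in nxt or cand[0] < nxt[dst][0]:
                        nxt[dst] = cand
        frontier = nxt
    accepted = [v for q, v in frontier.items() if q in fst.finals]
    return min(accepted, key=lambda v: v[0]) if accepted else None


# Hypothetical phonological layer: the flap "dx" may realise /t/ or /d/.
phones_to_phonemes = FST(0, [0], [
    (0, "b", "b", 0.0, 0),
    (0, "ae", "ae", 0.0, 0),
    (0, "dx", "t", 0.5, 0),
    (0, "dx", "d", 0.7, 0),
])

# Hypothetical lexicon-based phoneme network accepting "bat" and "bad".
phoneme_network = FST(0, [3], [
    (0, "b", "b", 0.0, 1),
    (1, "ae", "ae", 0.0, 2),
    (2, "t", "t", 0.1, 3),
    (2, "d", "d", 0.4, 3),
])

recognizer = compose(phones_to_phonemes, phoneme_network)
result = best_path(recognizer, ["b", "ae", "dx"])
```

In this sketch the composed machine only accepts phone sequences that map onto a lexicon entry, which illustrates the kind of strong in-vocabulary constraint the abstract describes. Handling OOV words would need an additional, less constrained sub-lexical path, which is outside the scope of this toy example.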
Similar resources
Context-dependent probabilistic hierarchical sublexical modelling using finite state transducers
This paper describes a unified architecture for integrating sub-lexical models with speech recognition, and a layered framework for context-dependent probabilistic hierarchical sublexical modelling. Previous work [1, 2, 3] has demonstrated the effectiveness of sub-lexical modelling using a core context-free grammar (CFG) augmented with context-dependent probabilistic models. Our major motivatio...
Klex: A Finite-State Transducer Lexicon of Korean
This paper describes the implementation and system details of Klex, a finite-state transducer lexicon for the Korean language, developed using XRCE’s Xerox Finite State Tool (XFST). Klex is essentially a transducer network representing the lexicon of the Korean language with the lexical string on the upper side and the inflected surface string on the lower side. Two major applications for Klex ...
A Non-deterministic Tokeniser for Finite-State Parsing
This paper describes a non-deterministic tokeniser implemented and used for the development of a French finite-state grammar. The tokeniser includes a finite-state automaton for simple tokens and a lexical transducer that encodes a wide variety of multiword expressions, associated with multiple lexical descriptions when required.
Using Genericity To Create Customizable Finite-State Tools
In this article we present the realization of a generic finite-state system. The system has been used to create concrete lexical tools for word form analysis, word form generation, creation and derivation history, and spellchecking. It will also be used to create a finite-state transducer for the recognition of phrases. Producing a finite-state component with the generic system requires little e~...